Refactor Entropy Calculations with Averaging Over Timesteps, Robust State Handling, and Enhanced Eigenvalue Filtering #130

harryswift01 · 2025-08-14T15:50:30Z

Summary

This PR refactors and enhances the entropy computation to support averaging over timesteps, improve robustness against missing data, and streamline matrix operations. It also includes improvements to trajectory iteration, dihedral handling, and unit test coverage.

Changes

Implement Averaging Over Timesteps

Introduced running average logic for force and torque covariance matrices.
Updated build_covariance_matrices and update_force_torque_matrices to compute averages incrementally.
Removed post-loop summation and division, ensuring matrices are averaged during processing.

Improve Robustness in Entropy and Dihedral Computation

Refactored entropy computation to handle None and empty states safely.
Added type-safe checks for lists and NumPy arrays.
Skipped dihedral conformation assignment when no dihedrals are present.
Prevented unnecessary allocation and ensured entropy is only computed when valid states exist.

Refactor Trajectory Iteration

Replaced direct iteration with zip(index, frame) for consistent indexing.
Ensured final frame is not skipped during iteration.

Eigenvalue Filtering Enhancements

Filtered out complex and non-positive eigenvalues.
Converted valid values to real numbers and logged discarded ones.
Optimised filtering with a single-pass mask and summary warning.

Unit Test Improvements

Updated unit tests to reflect refactored logic.
Tidied up comments for clarity and maintainability.

Default Grouping Configuration

Changed default grouping from each to molecules for more intuitive behavior.

Impact

Improves numerical stability and correctness of entropy calculations.
Reduces memory usage by avoiding full-frame storage.
Enhances code readability and maintainability.
Provides more intuitive default behavior for users.

- Replaced per-frame overwrite and post-loop sum/division with incremental running average updates, matching the approach used in `waterEntropy`. - Added `frame_counts` tracking to compute averages on the fly without storing all frame data. - Updated `build_covariance_matrices` and `update_force_torque_matrices` to initialize averages on first frame and update them in place each step. - Ensures final returned matrices are already averaged and representative of all processed timesteps.

- Removed division by `n_frames` in `entropy.py` to retain running averages - Replaced `+=` with `=` in `get_matrices` to avoid accumulation across frames - Simplified force and torque matrix updates in `levels.py` for consistency

…nd set them to 0

…ues: - Added logging for complex and non-positive eigenvalues with index and value - Applied mask to filter out invalid values using np.isreal, np.isclose, and positivity check - Converted filtered values to pure real numbers using .real

- Replaced per-element logging with a single summary warning based on mask length - Improved performance and readability by avoiding redundant loops

…127-add-averaging-over-timesteps

- Replaced direct iteration over trajectory slices with `zip(index, frame)` - Ensures consistent indexing and avoids skipping final frame

… for `assign_conformation`: - Replaced direct iteration over trajectory slices with `zip(index, frame)` - Ensures consistent indexing and avoids skipping final frame

… dihedrals are present: - Added check to avoid calling `assign_conformation` if `get_dihedrals` returns an empty list - Prevented unnecessary allocation of empty `conformation` arrays - Ensured `states` are only generated when valid dihedrals exist - `conformational_entropy_calculation` is only called if the `state` is not empty

…127-add-averaging-over-timesteps

…afely: - Replaced ambiguous truth checks with type-safe logic for lists and NumPy arrays - Added guard against NoneType access for `states[group_id]` to prevent iteration errors

…her than `each`

jimboid

This PR has had a lot of eyeballing from the whole team whilst it has been being developed. Test data suggests the code changes here are good. A very hard fought addition from the team here, well done all!

jkalayan

Changes look great, looks like we have a working version of CodeEntropy :)

CodeEntropy/config/arg_config_manager.py

CodeEntropy/entropy.py

CodeEntropy/levels.py

skfegan · 2025-08-15T09:05:13Z

In entropy.py._process_conformational_entropy line 432 is:

group_states = states[group_id] if group_id < len(states) else None

I am not sure this is correct. In the grouping_molecules function (in group_molecules.py), the groups are dictionaries with the keys as the molecule number of the first molecule of that type. This means that the group_ids are all integers, but may not be consecutive. For grouping == "each" or most of our current examples it probably works, but if there are systems with multiples of molecules depending on how they are ordered in the topology weird things might happen if you try using the group_id as a counter rather than treating it as a label.

…interpret -1 as the last frame

… end is -1

…jectory_bounds`

harryswift01 · 2025-08-15T10:34:52Z

In entropy.py._process_conformational_entropy line 432 is:

group_states = states[group_id] if group_id < len(states) else None

I am not sure this is correct. In the grouping_molecules function (in group_molecules.py), the groups are dictionaries with the keys as the molecule number of the first molecule of that type. This means that the group_ids are all integers, but may not be consecutive. For grouping == "each" or most of our current examples it probably works, but if there are systems with multiples of molecules depending on how they are ordered in the topology weird things might happen if you try using the group_id as a counter rather than treating it as a label.

This was an issue relating the the -1 input of end, we have now ensured that if the input of -1 is used we correctly assign end as the length of the Universe trajectory, this will resolve the indexing issues we have previously highlighted.

harryswift01 added 15 commits August 8, 2025 14:34

Refactor matrix averaging logic for modularity:

1497258

- Removed division by `n_frames` in `entropy.py` to retain running averages - Replaced `+=` with `=` in `get_matrices` to avoid accumulation across frames - Simplified force and torque matrix updates in `levels.py` for consistency

refactor: warn on negative eigenvalues instead of halting with error

d3c85d0

Filter out the negative eigenvalues in the frequencies calculations a…

b516673

…nd set them to 0

Optimise eigenvalue filtering: single-pass mask with summary warning:

81bc1ef

- Replaced per-element logging with a single summary warning based on mask length - Improved performance and readability by avoiding redundant loops

Merge branch 'main' of https://github.com/CCPBioSim/CodeEntropy into …

6c20773

…127-add-averaging-over-timesteps

Refactor trajectory iteration to use explicit index pairing via zip():

4eb9e94

- Replaced direct iteration over trajectory slices with `zip(index, frame)` - Ensures consistent indexing and avoids skipping final frame

Refactor trajectory iteration to use explicit index pairing via zip()…

8575847

… for `assign_conformation`: - Replaced direct iteration over trajectory slices with `zip(index, frame)` - Ensures consistent indexing and avoids skipping final frame

Merge branch 'main' of https://github.com/CCPBioSim/CodeEntropy into …

ff1df47

…127-add-averaging-over-timesteps

Refactor entropy computation to handle missing and empty state data s…

206a3c3

…afely: - Replaced ambiguous truth checks with type-safe logic for lists and NumPy arrays - Added guard against NoneType access for `states[group_id]` to prevent iteration errors

fix and update unit tests to reflect changes made within this refactor

9d0c8a3

tidy up comments within the unit tests

5863bdf

change the default grouping configuration to group by molecules rat…

927d1aa

…her than `each`

harryswift01 added this to the 1.0.0 release milestone Aug 14, 2025

harryswift01 requested review from jimboid, jkalayan and skfegan August 14, 2025 15:50

harryswift01 self-assigned this Aug 14, 2025

harryswift01 linked an issue Aug 14, 2025 that may be closed by this pull request

Add averaging of force and torque matrices over timesteps #127

Closed

3 tasks

harryswift01 added the feature request New feature or request label Aug 14, 2025

jimboid approved these changes Aug 14, 2025

View reviewed changes

jkalayan approved these changes Aug 14, 2025

View reviewed changes

CodeEntropy/config/arg_config_manager.py Show resolved Hide resolved

CodeEntropy/entropy.py Show resolved Hide resolved

CodeEntropy/levels.py Show resolved Hide resolved

harryswift01 added 3 commits August 15, 2025 11:08

Fix trajectory bounds logic in _get_trajectory_bounds to correctly …

9ea4806

…interpret -1 as the last frame

Fix trajectory end index logic to ensure all frames are analyzed when…

35912a3

… end is -1

fix test_get_trajectory_bounds test to match new logic in `_get_tra…

700f6bb

…jectory_bounds`

harryswift01 merged commit bc10e47 into main Aug 15, 2025
7 checks passed

harryswift01 deleted the 127-add-averaging-over-timesteps branch August 15, 2025 10:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Refactor Entropy Calculations with Averaging Over Timesteps, Robust State Handling, and Enhanced Eigenvalue Filtering #130

Refactor Entropy Calculations with Averaging Over Timesteps, Robust State Handling, and Enhanced Eigenvalue Filtering #130

Uh oh!

harryswift01 commented Aug 14, 2025 •

edited

Loading

Uh oh!

jimboid left a comment

Uh oh!

jkalayan left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

skfegan commented Aug 15, 2025

Uh oh!

harryswift01 commented Aug 15, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Refactor Entropy Calculations with Averaging Over Timesteps, Robust State Handling, and Enhanced Eigenvalue Filtering #130

Refactor Entropy Calculations with Averaging Over Timesteps, Robust State Handling, and Enhanced Eigenvalue Filtering #130

Uh oh!

Conversation

harryswift01 commented Aug 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Impact

Uh oh!

jimboid left a comment

Choose a reason for hiding this comment

Uh oh!

jkalayan left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

skfegan commented Aug 15, 2025

Uh oh!

harryswift01 commented Aug 15, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

harryswift01 commented Aug 14, 2025 •

edited

Loading